Data Sorting Method And System

ABSTRACT

In accordance with the teachings described herein, systems and methods are provided for data sorting. A method for use with one or more processing devices in order to merge sorted runs of data may include the steps of: defining a plurality of floating buffers; calculating a number of data blocks for each floating buffer; configuring the floating buffers to store the number of data blocks; and using the floating buffers to perform an external data sorting operation. A data sorting system may include one or more programs, and may be used with a plurality of floating buffers and a data storage device for storing a plurality of sorted runs of data blocks, each data block including a plurality of data records. The one or more programs in a data sorting system may be operable to calculate a number of data blocks for each floating buffer and configure the plurality of floating buffers to store the number of data blocks. In addition, the one or more programs in a data sorting system may be further operable to sort the plurality of data records into a single sorted output using the plurality of floating buffers.

This is a continuation of U.S. patent application Ser. No. 10/983,745, filed on Nov. 8, 2004, the entirety of which is incorporated herein by reference.

FIELD

The technology described in this patent document relates generally to data sorting. More specifically, this document describes systems and methods for sorting data using a multi-block floating buffer.

BACKGROUND

Sorting is the process of ordering items based on specified criteria. In data processing, sorting indicates the sequencing of records using a key value determined for each record. If a group of records is too large to be sorted within available random access memory, then a two-phase process, referred to as external sorting, may be used. In the first phase of the external sorting process, a portion of the records is typically sorted and the partial result, referred to as a sorted run, is stored to temporary external storage. Sorted runs are generated until the entire group of records is exhausted. Then, in the second phase of the external sorting process, the sorted runs are merged, typically to a final output record group. If all of the sorted runs cannot be merged in one pass, then the second phase may be executed multiple times in a process commonly referred to as a multi-pass or multi-phase merge. In a multi-phase merge, existing runs are merged to create a new, smaller replacement set of runs.

The records within a sorted run are typically written to external storage in sequential blocks of data, such that each block includes an integral number of records. The performance of typical merging and forecasting algorithms can be greatly affected by the size of the record block. For example, when sorting randomly ordered records, poor merge performance may result from the selection of a small block size because disk latency, which may be orders of magnitude larger than any other delay (e.g., memory access latency) encountered during merging, can dominate processing time. One method of increasing merge performance is to establish a large block size so that access costs (i.e., time spent locating the blocks) are insignificant compared to transfer costs (i.e., time spent reading the blocks). However, a large block size may also decrease performance by resulting in a multi-pass merge and, consequently, increased processing time and increased temporary storage requirements.

Another method for increasing performance during the merge phase is to eliminate time spent stalled on input (i.e., waiting for a record block to be retrieved from external storage) by reading blocks from storage in advance of their need while the merge is in progress. One algorithm used to achieve such parallelism is referred to as forecasting with floating buffers. This forecasting algorithm, designed to execute concurrently with the merge algorithm, reads blocks in the same sequence that the merge algorithm requires them. A typical forecasting algorithm determines which run to read next by comparing the largest key value of the last block read from each run being merged. The run associated with the smallest such key is the run from which the next block is read. The buffers, into which blocks are read, may be used to read data from any run, and are thus said to float among the runs.

SUMMARY

In accordance with the teachings described herein, systems and methods are provided for data sorting. A method for use with one or more processing devices in order to merge sorted runs of data may include the steps of: defining a plurality of floating buffers; calculating a number of data blocks for each floating buffer; configuring the floating buffers to store the number of data blocks; and using the floating buffers to perform an external data sorting operation. A data sorting system may include one or more programs, and may be used with a plurality of floating buffers and a data storage device for storing a plurality of sorted runs of data blocks, each data block including a plurality of data records. The one or more programs in a data sorting system may be operable to calculate a number of data blocks for each floating buffer and configure the plurality of floating buffers to store the number of data blocks. In addition, the one or more programs in a data sorting system may be further operable to sort the plurality of data records into a single sorted output run using the plurality of floating buffers. In other examples, the number of data blocks in the plurality of floating buffers may be pre-determined or may be calculated by a different system using the techniques described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example data sorting system for performing an external sorting operation.

FIG. 2 is a block diagram of an example data sorting system showing sorted runs.

FIGS. 3A-3C illustrate an example operation of a data sorting system.

FIG. 4 is a flow diagram illustrating an example operational scenario for merging sorted runs.

FIG. 5 is a flow diagram illustrating an example operational scenario involving forecasting.

FIG. 6 is a block diagram illustrating an example data structure for a multi-block floating buffer.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of an example data sorting system for performing an external sorting operation. The system includes a computing machine 101, a temporary data storage device 102 and a permanent data storage device 103. The computing machine 101 may include a multiprocessor computer architecture having two independent central processing units (CPUs) 105, 106 that share access to a common random access memory (RAM) 104. However, it should be understood that different computing devices may be used, such as computing devices that have a single processing device.

The temporary and permanent data storage devices 102, 103 are direct access storage devices accessible by the computing machine 101. The temporary data storage device 102 is used to store a plurality of sorted runs of data blocks, such as, but not limited to, in a single computer file. The sorted runs are merged (e.g., into a single sorted computer file) and the sorted output is stored in the permanent data storage device 103. The external sorting process may be performed by a forecasting program and a merge program, which may, for example, be independently executed as two separate threads of execution that are respectively executed by the multiple CPUs 105, 106 of the computing machine 101. It should be understood, however, that the forecasting and merge programs may be implemented in different ways, such as using a single processing device. In addition, the sorted runs and sorted output may be stored in other types of memory devices and/or memory configurations, which may be either external or internal to the computing machine 101.

FIG. 2 depicts multiple sorted runs 220 for processing by a data sorting system. The data sorting system includes a computing machine 201, a first data storage device 202 for storing sorted runs 220 of data blocks 222, and a second data storage device 203 for storing sorted output. However, it should be understood that other configurations can be used; for example, the sorted runs 220 and the sorted output 203 may be stored in a single data storage device.

The sorted runs 220 can be stored in a single file within the first data storage device 202. The sorted runs 220 include a plurality of data blocks 222, and the data blocks 222 store a plurality of data records 224. The data records 224 may include both record data and key values. The key value associated with a particular data record 224 may be extracted from or included with the data record 224. The data records 224 within each sorted run 220 are sorted according to the keys (e.g., in ascending or descending order). An example of a sorted run showing sorted record keys is described below with reference to FIG. 3A.

The computing machine 201 includes a forecasting program 226, a merge program 228, and a plurality of floating buffers 230. The forecasting program 226 and merge program 228 are stored in a memory location in the computing machine 201 and are executed by one or more processing devices in the computing machine 201 to perform an external sorting operation. For example, the forecasting program 226 and merge program 228 may be two separate threads of execution that are respectively executed by multiple CPUs 105, 106 of the computing machine 101, as described above with reference to FIG. 1.

The floating buffers 230 may, for example, be memory locations defined in one or more memory devices on the computing machine 201. Each floating buffer 231 is configurable to store a plurality of data blocks 232. The number of data blocks 232 in a floating buffer 231 may, for example, be configured by the merge program 228, as described below. However, other approaches can be used; for example, the number of data blocks 232 in the floating buffers 231 may be determined by a program other than the merge program 228 or may be determined by user input.

In operation, the data blocks 222 from the sorted runs 220 are read into the floating buffers 230 and are merged by the merge program 228 to generate the sorted output 203. As the merge program 228 progresses, depleted buffers 230 are passed to the forecasting program 226 to be repopulated from the sorted runs 220. The forecasting program 226 begins its operation by attempting to acquire one or more empty buffers 230 but, failing that, may either allocate a new floating buffer 231, if memory limits permit, or wait until a buffer 231 becomes available from the merge program 228. The forecasting program 226 then reads data blocks 222 from the sorted runs 220 into one or more floating buffers 230 in advance of their need by the concurrently executing merge program 228.

In order for the forecasting program 226 to operate concurrently (if desired) with the merge program 228, one or more floating buffers 230 should be available beyond the number required to perform the merge operation. If no extra buffers 230 are available, then the forecasting program 226 waits for the merge program 228 to deplete a buffer 231 and pass the empty buffer 231 to the forecasting program 226 so that it may initiate a read request. With one extra floating buffer 231, the forecasting program 226 can determine which data block 222 will be needed next by the merge program 228, and initiate a read request while the runs are being merged. Upon completing the read request, the forecasting program 226 passes the newly populated floating buffer to the merge program 228 as needed, and waits for another empty buffer 231.

In some instances, two or more floating buffers may near depletion around the same time during a merge. In this case, a single extra floating buffer 231 may not suffice for full parallelism because the merge program 228 will attempt to retrieve two or more filled replacement buffers in quick succession from the forecasting program 226. The first attempt by the merge program 228 to retrieve a replacement buffer will succeed if the forecasting program 226 has already filled the one extra buffer 231. However, subsequent attempts to retrieve a replacement buffer may stall because, without additional buffers, replacements cannot be produced in advance. The likelihood of this occurrence depends upon the initial order of the input records to be sorted. For records that are already ordered, there is usually no chance that two or more buffers will near depletion together because the merge program 228 will draw records sequentially from each buffer 231, read blocks sequentially from each run 220, and tap the runs 220 in sequence. However, for records that are randomly ordered, it may occur that all of the buffers will near depletion together because the merge program 228 will draw records alternately and with roughly equal probability from among the buffers 230. This latter situation results in a near immediate need for replacement buffers by the merge program 228 for every run 220 being merged. Thus, to facilitate full parallelism between the forecasting program 226 and the merge program 228, the total number of floating buffers 230 allocated to the forecasting and merge programs 226, 228 should be twice the number of runs 220 being merged. Though not necessary to achieve full parallelism, additional floating buffers 230 beyond this number may be useful in maintaining a steady flow of data to the merge program 228 and smoothing the burst of input demands on the operating system and hardware in a multi-user or multi-tasking system.

In addition, as noted above, the number of blocks 232 available in each floating buffer 231 will affect the access costs of the sorting system. The following mathematical process may be used to determine the optimum number of blocks 232 for each floating buffer 231. This mathematical process may also be used to determine the amount of memory needed for the floating buffers 230 in order to achieve a desired reduction in disk access costs and full parallelism between the forecasting program 226 and the merge program 228.

As shown by the example mathematical process below, when retrieving a block from a sorted run, the retrieval of additional, logically successive blocks from the run can reduce access costs by avoiding the disk latency often associated with locating the additional blocks. Access costs can be reduced, therefore, by effectively increasing the number of blocks read per access instead of increasing the block size. For arbitrary block sizes, the performance benefits afforded by forecasting can be attained by allowing the forecasting algorithm to support input into multi-block floating buffers and ensuring that access costs are acceptably reduced by adjusting the number of blocks per floating buffer. This scheme may also provide flexibility in the merge phase by allowing some control over the number of passes required to complete the external sort because the number of blocks per floating buffer can be adjusted to allow for a greater or fewer number of buffers and, consequently, directly affect the number of runs that may be merged in a single pass.

The time required to read data from storage 202 dictates the speed at which sorted runs 220 may be merged. Assuming no disk latency costs, this time is equivalent to the time required to read the entire file sequentially and can be calculated as follows:

${t = \frac{F}{\rho}},$

where t is the I/O time (seconds), F is the file size (bytes), and ρ is the disk transfer rate (bytes/second).

If we assume that the file size (F) is some integer multiple of the block size (B), such that $F = n_{blocks} \times B$, where B is the block 222 size (bytes) and $n_{blocks}$ is the number of blocks in the file, then the equation becomes:

$t = {n_{blocks} \times {\frac{B}{\rho}.}}$

Disk latency (L) is the sum of the positional latency (s) and the rotational latency (r). Assuming a non-zero disk latency (L) and that access costs are encountered for every block, then the time required to read the file may be expressed as follows:

$t = {n_{blocks} \times {\left( {\frac{B}{\rho} + L} \right).}}$

Thus, the percentage of time spent locating blocks may be expressed as follows:

${q = {\frac{n_{blocks} \times L}{n_{blocks} \times \left( {\frac{B}{\rho} + L} \right)} \times 100}};$

which simplifies to:

$q = {\frac{L}{\left( {\frac{B}{\rho} + L} \right)} \times 100.}$

The percentage of time spent locating blocks is not dependent upon the number of blocks in the file and, therefore, is independent of the file size. The total time spent locating blocks may be reduced by reading not only the next block that is required, but also one or more subsequent blocks from the same run 220. Because the additional blocks are likely to be adjacent to the first located block, there should be no location costs incurred to read the additional blocks. If the number of blocks read is increased by a factor of N for each disk access, where N is the number of blocks in each buffer 231, the disk latency percentage may be reduced to:

$q = {\frac{L}{\left( {\frac{NB}{\rho} + L} \right)} \times 100.}$
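The effect of N on the latency percentage can be explored numerically. The following is a minimal sketch, not part of the described system; the function name `latency_percent` and the sample values in the loop are assumptions chosen only for illustration.

```python
def latency_percent(disk_latency_s: float, block_bytes: int,
                    transfer_rate_bps: float, blocks_per_access: int = 1) -> float:
    """Percentage q of total read time spent locating blocks.

    Implements q = L / (N*B/rho + L) * 100, where L is the disk latency
    (seconds), B the block size (bytes), rho the transfer rate
    (bytes/second) and N the number of blocks read per disk access.
    """
    transfer_time = blocks_per_access * block_bytes / transfer_rate_bps
    return disk_latency_s / (transfer_time + disk_latency_s) * 100.0


# Illustrative (assumed) values: 13 ms latency, 64 KiB blocks, 30 MB/s.
for n in (1, 8, 64):
    print(n, round(latency_percent(0.013, 65536, 30_000_000, n), 1))
```

Reading more blocks per access lengthens the transfer term while the latency term stays fixed, so the printed percentage falls as N grows.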

The above equation may be used to establish a cap (Q) on the percentage of time spent locating blocks, as follows:

${{\frac{L}{\left( {\frac{NB}{\rho} + L} \right)} \times 100} \leq Q},$

where Q is the desired maximum percentage of total time consumed by disk latency. The number of blocks per floating buffer (N) that will satisfy this relation may then be established, as follows:

$N \geq {\frac{\rho \; {L\left( {\frac{100}{Q} - 1} \right)}}{B}.}$
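The intermediate algebra between the cap and this bound is not spelled out in the text. One way to fill it in, assuming 0 < Q ≤ 100 and positive L, B and ρ so that the inequality direction is preserved, is:

```latex
\begin{align*}
\frac{L}{\frac{NB}{\rho} + L} \times 100 &\leq Q \\
100\,L &\leq Q\left(\frac{NB}{\rho} + L\right) \\
L\left(\frac{100}{Q} - 1\right) &\leq \frac{NB}{\rho} \\
N &\geq \frac{\rho\,L\left(\frac{100}{Q} - 1\right)}{B}
\end{align*}
```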

That is, any number of blocks per floating buffer (N) that is greater than or equal to

$\frac{\rho \; {L\left( {\frac{100}{Q} - 1} \right)}}{B}$

will cause the latency, as a percentage of total input time, to be reduced to Q or below. Thus, the smallest number of blocks per floating buffer (N) that will produce the desired reduction of latency may be expressed as:

$N = {\left\lceil \frac{\rho \; {L\left( {\frac{100}{Q} - 1} \right)}}{B} \right\rceil.}$

As discussed above, the number of available floating buffers 230 needed to achieve full parallelism between the forecasting program 226 and the merge program 228 is equal to twice the number of runs to be merged ($n_{runs}$). Thus, the memory requirements to satisfy latency reduction and to provide full parallelism may be expressed as:

$M = (2 \times n_{runs}) \times (N \times B).$

The following example examines the Hitachi Deskstar 180GXP hard disk drive (Model IC35L180AVV207-I), which has the following specifications.

Hitachi Deskstar 180GXP Hard Drive Specifications

| Specification | Value |
| --- | --- |
| Latency average (ms) | 4.17 |
| Average seek time (ms) | 8.80 |
| Sustained data rate (MB/sec) | 29 to 56 |

The latency average in the above hard drive specifications refers to average rotational latency, which is the amount of time required for the disk to complete half a rotation. The Hitachi Deskstar 180GXP rotates at 7200 rpm, yielding an average rotational latency (r) of 4.17 ms. The average seek time is also referred to as average positional latency (s). Using the smaller value of the specified sustained data rate range, 29 MB/s, yields a transfer rate (ρ) of 30,408,704 bytes per second (with 1 MB = 2²⁰ bytes). The derived parameters of the Hitachi Deskstar 180GXP hard drive are as follows:

Hitachi Deskstar 180GXP Derived Hardware Parameters

| Parameter | Value |
| --- | --- |
| Positional latency (s), seconds | 0.00880 |
| Rotational latency (r), seconds | 0.00417 |
| Disk latency (L = s + r), seconds | 0.01297 |
| Disk transfer rate (ρ), bytes/second | 30,408,704 |

For the purposes of this example, assume that a 128 MB data set is being sorted using 32 MB of random access memory. Further assume that the first phase of the external sorting process (e.g., the creation of the sorted runs) resulted in 4 sorted runs of 64 KB blocks. Then, if a percentage latency cap, Q, of 10% is desired, the sorting dimensions are as follows.

Sorting Dimensions

| Dimension | Value |
| --- | --- |
| File size (F), bytes | 134,217,728 |
| Block size (B), bytes | 65,536 |
| Number of sorted runs ($n_{runs}$) | 4 |
| Percentage latency cap (Q) | 10% |

Using these parameters, the number of blocks 232 per floating buffer 231 that will satisfy a 10% latency cap (Q) may be determined, as follows.

$N = {\left\lceil \frac{\rho \; {L\left( {\frac{100}{Q} - 1} \right)}}{B} \right\rceil = {\left\lceil \frac{30408704 \times 0.01297\left( {\frac{100}{10} - 1} \right)}{65536} \right\rceil = 55.}}$
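The arithmetic of this example, together with the memory requirement M = (2 × n_runs) × (N × B) given earlier, can be checked with a short script. This is a sketch only; the function names `blocks_per_buffer` and `buffer_memory` are hypothetical, and the inputs are the drive and sorting values stated in the example.

```python
import math


def blocks_per_buffer(rho: float, latency: float, block_size: int, q_cap: float) -> int:
    """Smallest N for which latency is at most q_cap percent of input time."""
    return math.ceil(rho * latency * (100.0 / q_cap - 1.0) / block_size)


def buffer_memory(n_runs: int, n_blocks: int, block_size: int) -> int:
    """Bytes needed for 2 * n_runs floating buffers of n_blocks blocks each."""
    return (2 * n_runs) * (n_blocks * block_size)


RHO = 29 * 2**20             # 29 MB/s with 1 MB = 2**20 bytes -> 30,408,704
LATENCY = 0.00880 + 0.00417  # seek time + average rotational latency, seconds
BLOCK = 64 * 1024            # 64 KB blocks
N_RUNS = 4

N = blocks_per_buffer(RHO, LATENCY, BLOCK, q_cap=10)
M = buffer_memory(N_RUNS, N, BLOCK)
print(N, M)  # prints: 55 28835840
```

With N = 55, the eight floating buffers needed for full parallelism occupy 28,835,840 bytes (27.5 × 2²⁰ bytes), which fits within the 32 MB of random access memory assumed in this example.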

With reference now to FIGS. 3A-3C, a block diagram is provided to illustrate an example operation of a data sorting system. FIG. 3A illustrates an example of sorted runs stored in a data storage device. FIG. 3B illustrates the operation of a forecasting program and a merge program during a merge. FIG. 3C illustrates the operation of a forecasting program to replace floating buffers depleted by a merge program during the merge. This example illustrates just one of many ways to implement a merge program and a forecast program. In this example, merge operations and forecast operations are respectively performed by a merge thread 302 and a forecast thread 300 in a multi-thread environment.

With reference first to FIG. 3A, a stored file is illustrated that includes five sorted runs 320 (R-1 through R-5). The sorted runs 320 include a plurality of data blocks 322 that store the sorted data records. The range of key values for the data records 324 stored in each of the data blocks 322 is shown in FIG. 3A. For example, the first data block 322 in the first sorted run (R-1) includes sorted data records 324 having key values starting at 35 and ending at 80.

With reference now to FIG. 3B, a merge program loads blocks 322 from each run 320 into an initial set of floating buffers 330. The floating buffers 330 in this example each include three blocks 332. Thus, the first three blocks 322 from each run 320 are loaded into an initial set of buffers (labeled a-e). The blocks of data 332 within a buffer are referred to herein as a block tuple. For convenience, only the last key value in each block tuple is shown in the diagram. Once the initial set of buffers (a-e) is loaded, the merge thread 302 begins comparing the data records in the buffers (a-e) to generate the sorted output 303.

Concurrent with the operation of the merge program, the forecasting thread 300 acquires floating buffers (g and f) that are not in use by the merge thread 302. As discussed above, for full parallelism between the merge thread 302 and forecasting thread 300 there should be an equal number of floating buffers available to the forecasting thread 300. However, for the purposes of this example, only two additional floating buffers (g and f) are available. The forecasting thread 300 determines the order in which the buffers (a-e) being used by the merge thread 302 will be depleted, and assigns the additional buffers (g and f) based on this determination. The order of buffer depletion by the merge program may be determined by comparing the last key in each block tuple.

In the illustrated example, the last key in floating buffer “c” has the lowest value (108). Thus, the forecasting thread 300 determines that floating buffer “c” will be depleted first and assigns the additional buffer “f” to the third sorted run (R-3). The forecasting thread 300 then initiates a read request to fill the additional buffer “f” from the next three data blocks 322 from sorted run “R-3,” having a key value range of 232 through 362. Similarly, the additional buffer “g” is next assigned to and filled from the first sorted run (R-1), which has the second lowest value (188) in its last key position.

Once the merge thread 302 has depleted a buffer 330, the depleted buffer is passed to the forecasting thread 300 and is replaced with a populated buffer 330, as illustrated in FIG. 3C. In FIG. 3C, the forecasting thread 300 has replaced depleted floating buffer “c” with populated buffer “f.” The forecasting thread 300 has then reassigned the depleted buffer “c” to sorted run “R-4,” which has the next lowest last key value (199), and repopulated buffer “c” with the next three data blocks 322 from the run 320.

FIG. 4 is a flow diagram illustrating an example method of merging sorted runs, which may be performed by a merge program. The method begins at step 400. At step 401, the method determines the number of data blocks in the floating buffers. This step 401 may, for example, be performed using the calculations described above with reference to FIG. 2. After the buffers have been sized, the initial set of buffers (one buffer for each run being merged) is allocated and filled from the stored sorted runs at step 402. Once the initial buffers are filled, a forecasting method is initialized at step 403. An example forecasting method is described below with reference to FIG. 5.

Starting at step 404, the method proceeds into its primary loop of execution. At step 404, the method determines which record should next be merged from the buffers to the sorted output. When the next record to be merged is identified, the record is copied from the buffer to the sorted output file at step 405. Then, at step 406, the method determines if there are more records to be merged from the same buffer. If the buffer includes more records to be merged, then the method advances to the next record in the buffer at step 408 and proceeds to step 412. Else, if there are no more records in the buffer, then the method proceeds to step 407.

At step 407, the method determines if there are more blocks to be merged from the same sorted run. If not, then the run is removed from the merge process (step 410) and the method proceeds to step 412. If there are more blocks to be merged from the run, however, then the depleted buffer is released to the forecasting process at step 409, a filled buffer for the same sorted run is obtained from the forecasting process at step 411, and the method proceeds to step 412.

At step 412, the method determines if the records in all of the sorted runs have been merged. If not, then the method returns to step 404 to repeat the primary loop of execution. Otherwise, if there are no additional records to be merged, then the method waits for the forecasting method to terminate at step 413, and the method ends at step 414.

FIG. 5 is a flow diagram illustrating an example forecasting method. The method begins at step 500. At step 501, the method determines if there are any further blocks to read in the sorted runs. If not, then the method ends at step 511. Otherwise, if there are additional blocks to be read, then the method determines at step 502 whether there are any empty floating buffers available. If there are no floating buffers available, then the method proceeds to step 503. Else, if an empty floating buffer is available, then the method proceeds to step 507.

At step 503, the method determines if the total amount of memory allocated for the floating buffers is below a selected limit. If the allocated memory is below the selected limit, then the method attempts to allocate an additional floating buffer at step 504. If successful in allocating the additional floating buffer (step 505), then the method proceeds to step 507. However, if either the allocation limit has been reached (step 503) or the method is not successful in allocating an additional floating buffer (step 505), then the method waits for a depleted buffer to be passed from the merging method at step 506, and proceeds to step 507 when a buffer is available.
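The buffer acquisition logic of steps 502 through 506 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the described implementation: the class name `BufferPool` and its methods are hypothetical, buffers are modeled as `bytearray` objects of a fixed size, a single forecasting thread is assumed, and the limit check here tests whether one more buffer would still fit under the limit.

```python
import queue


class BufferPool:
    """Hypothetical pool of equally sized floating buffers."""

    def __init__(self, buffer_bytes: int, memory_limit: int):
        self.buffer_bytes = buffer_bytes
        self.memory_limit = memory_limit
        self.allocated = 0
        # Depleted buffers handed back by the merge side (thread-safe queue).
        self.empty: "queue.Queue[bytearray]" = queue.Queue()

    def release(self, buf: bytearray) -> None:
        """Called by the merge side when a buffer has been depleted."""
        self.empty.put(buf)

    def acquire(self, timeout=None) -> bytearray:
        # Step 502: use an empty buffer if one is already available.
        try:
            return self.empty.get_nowait()
        except queue.Empty:
            pass
        # Steps 503-505: try to allocate while under the memory limit.
        if self.allocated + self.buffer_bytes <= self.memory_limit:
            try:
                buf = bytearray(self.buffer_bytes)
            except MemoryError:
                pass  # allocation failed, fall through to waiting
            else:
                self.allocated += self.buffer_bytes
                return buf
        # Step 506: wait for the merge side to pass back a depleted buffer.
        # With timeout=None this blocks; with a timeout it raises queue.Empty.
        return self.empty.get(timeout=timeout)
```

A merge thread would call `release` on each depleted buffer, which wakes a forecasting thread blocked in `acquire`.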

Once a free floating buffer is acquired, the method forecasts which data blocks from the stored sorted runs will next be required by the merging method and reads those blocks into the free buffer at step 507. This forecasting step (step 507) may be performed by inspecting the last record of each sorted run that is currently buffered. The run whose last buffered record has the smallest key value is the sorted run from which a new range of data blocks should be read next. Once the next block tuple is read into a floating buffer at step 507, the method determines at step 508 if there are any unread blocks remaining in the same sorted run. If unread blocks remain in the sorted run, then the method proceeds to step 510. Else, if there are no additional blocks to be read from the sorted run, then the run is removed from the forecasting process at step 509, and the method proceeds to step 510. At step 510, the newly filled buffer is transferred to the merging method, and the method returns to step 501.

It should be understood that similar to the other processing flows described herein, the steps and the order of the steps in the flowchart described herein may be altered, modified, deleted and/or augmented and still achieve the desired outcome.

This written description uses examples to disclose the invention, including the best mode, and also to enable a person skilled in the art to make and use the invention. The patentable scope of the invention may include other examples that occur to those skilled in the art. As an example of the wide scope of the systems and methods disclosed herein, various data structures can be used. FIG. 6 is a block diagram illustrating an example data structure for a multi-block floating buffer 600. The floating buffer 600 includes a buffer data structure 602 and two separately allocated memory spaces 604, 606: a first memory space 604 to store pairs of record keys 603 and associated record pointers 605 (referred to herein as the “key space” or a “key memory location”); and a second memory space 606 to store the record blocks (referred to herein as the “record space” or “record memory location”). Also illustrated are a plurality of run descriptor data structures 608.

After a sorted run is created during the first stage of an external sorting process, the attributes of the sorted run are recorded in a run descriptor data structure 608. Run descriptor data structures 608 are chained together (via the “next” field) to form a list, referred to herein as a “run list.” The run list is carried into the second (merge) phase of the external sorting process and is used by the merging and forecasting programs to track which data blocks have been retrieved from storage.

With reference to the floating buffer 600, the record space 606 is sized to hold one or more record blocks 610 and the key space 604 is sized to hold a corresponding number of keys 603. In this particular example, both the key and record spaces 604, 606 are monolithic allocations (i.e., the memory within a space is contiguous). After blocks are read from storage into a buffer, key values (on which records are to be ordered) are projected from the records into the key space 604 and, for each key 603, a record pointer 605 is set to indicate the source record 612.

After a buffer is loaded, values are set in the buffer data structure 602 for a current key pointer 614, a start block identifier 616, a total block identifier 618, a total records identifier 620, a disk run pointer 622 and a more blocks identifier 624. The current key pointer 614 indicates the smallest (top) key 603 in the buffer. The start block identifier 616 indicates the first block loaded into the buffer. The total block identifier 618 indicates the number of blocks loaded into the buffer. The total records identifier 620 indicates the number of records loaded into the buffer. The disk run pointer 622 identifies the run descriptor data structure 608 for the sorted run from which the blocks were read. The more blocks identifier 624 indicates whether there are more blocks remaining to be read from the sorted run.

In addition, the run descriptor data structure 608 is also updated after a buffer is loaded to set values for a high key pointer 626, a total records identifier 628, a total blocks identifier 630 and a next block identifier 632. The high key pointer 626 indicates the largest (bottom) key within the key space 604 of the floating buffer 600. The total records identifier 628 indicates the number of records remaining in the sorted run. The total blocks identifier 630 indicates the number of blocks remaining in the sorted run. The next block identifier 632 indicates the next block in the sorted run to be loaded into the floating buffer 600.
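The buffer and run descriptor fields described above can be modeled in a few lines. The following is a simplified sketch under stated assumptions: the class and function names are hypothetical, records are modeled as `(key, payload)` tuples, a run's blocks are held in an in-memory list standing in for disk storage, pointers are replaced by list indices or stored values, and `load_buffer` is assumed to be called only while the run still has unread blocks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (key, payload); an assumed record layout


@dataclass
class RunDescriptor:
    run_ordinal: int
    high_key: Optional[int] = None   # largest key buffered so far (cf. 626)
    total_records: int = 0           # records remaining in the run (cf. 628)
    total_blocks: int = 0            # blocks remaining in the run (cf. 630)
    next_block: int = 0              # next block to load (cf. 632)
    next: Optional["RunDescriptor"] = None  # run list link


@dataclass
class FloatingBuffer:
    key_space: List[Tuple[int, int]] = field(default_factory=list)  # (key, record index)
    record_space: List[Record] = field(default_factory=list)
    current_key: int = 0             # index of smallest unmerged key (cf. 614)
    start_block: int = 0             # cf. 616
    total_blocks: int = 0            # cf. 618
    total_records: int = 0           # cf. 620
    disk_run: Optional[RunDescriptor] = None  # cf. 622
    more_blocks: bool = False        # cf. 624


def load_buffer(buf: FloatingBuffer, run: RunDescriptor,
                run_blocks: List[List[Record]], n_blocks: int) -> None:
    """Load up to n_blocks blocks of a run and update both structures."""
    start = run.next_block
    loaded = run_blocks[start:start + n_blocks]
    buf.record_space = [rec for block in loaded for rec in block]
    # Project keys out of the records, each with an index to its source record.
    buf.key_space = [(rec[0], i) for i, rec in enumerate(buf.record_space)]
    buf.current_key = 0
    buf.start_block = start
    buf.total_blocks = len(loaded)
    buf.total_records = len(buf.record_space)
    buf.disk_run = run
    run.next_block = start + len(loaded)
    run.total_blocks -= len(loaded)
    run.total_records -= buf.total_records
    run.high_key = buf.key_space[-1][0]
    buf.more_blocks = run.total_blocks > 0
```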

A merge program may maintain a priority queue of floating buffers 600, containing one floating buffer for each sorted run being merged. The current key pointer 614 in the buffer data structure 602 is used to order buffers within the queue by comparing the current key value for all of the buffers. Ties may be resolved using the run ordinal 633 within the corresponding run descriptor data structure 608. The priority queue may provide immediate access to the floating buffer containing the smallest key value 603 (indicated by the current key pointer value 614 for that buffer). The record from which the smallest key value originated is then emitted, the current key pointer 614 is updated to point to the next key 603 in the key space 604, and the priority queue is adjusted to locate the buffer with the next smallest key value.
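The merge loop described above may be sketched with a binary heap serving as the priority queue. This is an illustration under simplifying assumptions, not the merge program itself: each buffer is reduced to a sorted list of keys, the position of a buffer in the input list serves as its run ordinal, and reloading of depleted buffers is omitted. The function name merge_buffers is hypothetical.

```python
import heapq


def merge_buffers(buffers):
    """Merge sorted key lists, breaking ties by run ordinal.

    buffers: list of sorted key lists, one per run (assumed layout).
    Returns (key, run ordinal) pairs in emission order.
    """
    # Heap entries are (current key, run ordinal, current key position);
    # tuple comparison orders by key first and by ordinal on ties.
    heap = [(keys[0], ordinal, 0)
            for ordinal, keys in enumerate(buffers) if keys]
    heapq.heapify(heap)
    emitted = []
    while heap:
        key, ordinal, position = heapq.heappop(heap)
        emitted.append((key, ordinal))
        position += 1  # advance the current key pointer
        if position < len(buffers[ordinal]):
            heapq.heappush(heap, (buffers[ordinal][position], ordinal, position))
    return emitted
```

Placing the run ordinal second in each heap entry means that equal keys are emitted in run order without a separate comparison function.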

A forecasting program may maintain a priority queue of run descriptors 608, containing one run descriptor for each run being merged. The high key pointer 626 is used to order descriptors within the queue by comparing the highest key value for every participating run that has been buffered. Ties may be resolved using the run ordinal, ensuring that the forecasting read sequence is the same as the merge consumption sequence. The priority queue may provide immediate access to the run descriptor 608 pointing to the smallest high key value 626. The sorted run associated with this run descriptor is the sorted run from which the next record block or blocks should be read. After a buffer is obtained and filled with the record blocks from the sorted run, the run descriptor data structure 608 is updated and the priority queue is adjusted to locate the run descriptor which points to the next smallest high key value 626.
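The forecasting order may be sketched in a similar way. The example assumes that every run is available as a list of non-empty blocks of ascending keys, that each run receives one initial buffer load in ordinal order, and that a run leaves the queue once its last block has been read; these choices, the function name forecast_read_order and the blocks_per_buffer parameter are assumptions made for illustration.

```python
import heapq


def forecast_read_order(runs, blocks_per_buffer=1):
    """Return the sequence of reads as (run ordinal, first block index).

    runs: list of runs; each run is a list of blocks of ascending keys.
    """
    order = []
    heap = []  # entries: (high key buffered so far, run ordinal, next block)
    # Initial load: one buffer per run.
    for ordinal, run in enumerate(runs):
        if run:
            loaded = min(blocks_per_buffer, len(run))
            order.append((ordinal, 0))
            if loaded < len(run):
                heap.append((run[loaded - 1][-1], ordinal, loaded))
    heapq.heapify(heap)
    # The run with the smallest buffered high key is depleted first by the
    # merge, so its next blocks are the next ones to read.
    while heap:
        _, ordinal, next_block = heapq.heappop(heap)
        run = runs[ordinal]
        end = min(next_block + blocks_per_buffer, len(run))
        order.append((ordinal, next_block))
        if end < len(run):
            heapq.heappush(heap, (run[end - 1][-1], ordinal, end))
    return order
```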

It is further noted that the systems and methods described herein may be implemented on various types of computer architectures, such as, for example, on a single general purpose computer or workstation, or on a networked system, or in a client-server configuration, or in an application service provider configuration.

It is further noted that the systems and methods may include data signals conveyed via networks (e.g., local area network, wide area network, internet, etc.), fiber optic medium, carrier waves, wireless networks, etc. for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.

Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform methods described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.

The systems' and methods' data (e.g., associations, mappings, etc.) may be stored and implemented in one or more different types of computer-implemented ways, such as different types of storage devices and programming constructs (e.g., data stores, RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.

The systems and methods may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.

The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.

1. A computer-implemented method for merging sorted runs of data, comprising: comparing data records in a first set of floating buffers to generate a sorted output; storing the sorted output in a computer readable medium; copying additional data blocks from a plurality of sorted runs into a second set of floating buffers before the additional data blocks are needed by the comparing step; and replacing depleted floating buffers from the first set with floating buffers from the second set containing the additional data blocks.
2. The computer-implemented method of claim 1, further comprising: prior to the comparing step, configuring the first and second sets of floating buffers to store a predetermined number of data blocks, wherein the predetermined number of data blocks is calculated to achieve an optimal number of data blocks for each floating buffer.
3. The computer-implemented method of claim 2, wherein the optimal number of data blocks is the number for achieving a selected reduction in disk latency costs.
4. A data sorting system, comprising: a data store for storing a plurality of sorted runs of data blocks, each data block including a plurality of data records; a computing device configured with a plurality of floating buffers, the plurality of floating buffers including a first set of floating buffers and a second set of floating buffers; and one or more programs stored in a memory location on the computing device and configured to compare data records in the first set of floating buffers to generate a sorted output, copy additional data blocks from the plurality of sorted runs into the second set of floating buffers before the additional data blocks are needed in the first set of floating buffers, and replace depleted floating buffers from the first set of floating buffers with floating buffers from the second set of floating buffers containing the additional data blocks.
5. The data sorting system of claim 4, wherein the one or more programs are further configured to define the first and second sets of floating buffers to store a predetermined number of data blocks, wherein the predetermined number of data blocks is calculated to achieve an optimal number of data blocks for each floating buffer.
6. The data sorting system of claim 5, wherein the optimal number of data blocks is the number needed to achieve a desired reduction in disk latency costs.
7. A floating buffer for use with a data sorting system having a computing device and one or more programs stored in a memory location on the computing device, the one or more programs when executed by the computing device being operable to sort data records from a plurality of sorted runs of data blocks into a single sorted output, the floating buffer comprising: a record memory location configured to store a plurality of data blocks; wherein the one or more programs calculate a number of data blocks stored in the record memory location and configure the record memory location to store the number of data blocks.
8. The floating buffer of claim 7, further comprising: a key memory location configured to store record key values for associating the data blocks stored in the record memory location with a location of the data blocks in the sorted runs of data blocks.
9. The floating buffer of claim 8, further comprising: a buffer data structure configured to identify the data blocks stored in the record memory location.
10. The floating buffer of claim 9, wherein the buffer data structure includes a current key pointer for identifying a smallest record key value stored in the key memory location.
11. The floating buffer of claim 9, wherein the buffer data structure includes a start block identifier for identifying a first data block loaded into the record memory location.
12. The floating buffer of claim 9, wherein the buffer data structure includes a total block identifier for indicating a total number of data blocks stored in the record memory location.
13. The floating buffer of claim 9, wherein the buffer data structure includes a total records identifier for indicating a total number of data records stored in the record memory location.
14. The floating buffer of claim 9, wherein a plurality of run descriptor data structures are stored in a memory location on the computing device, each run descriptor data structure identifying the data blocks included in one of the sorted runs of data blocks, and wherein the buffer data structure includes a disk run pointer for identifying one of the run descriptor data structures associated with the data blocks stored in the record memory location.